Non-Radiation-Hardened Microprocessors in Space-Based Remote Sensing Systems


The CALIPSO (Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observations) mission carries a comprehensive suite of active and passive sensors. These include a 20 Hz, 230 mJ Nd:YAG lidar, a visible-wavelength Earth-looking camera, and an imaging infrared radiometer. CALIPSO is a planned 3-year mission. It flies in formation with the Earth Observing System Post-Meridian (EOS PM) train and provides continuous, near-simultaneous measurements. CALIPSO was launched into a 98-degree sun-synchronous Earth orbit in April 2006 to study clouds and aerosols, and it acquires over 5 gigabytes of data every 24 hours. Figure 1 shows the ground track of one CALIPSO orbit, together with high- and low-intensity South Atlantic Anomaly (SAA) outlines. CALIPSO passes through the SAA several times each day.

Space-based remote sensing systems that include multiple instruments and/or instruments such as lidar generate large volumes of data. They require robust real-time hardware and software mechanisms and high-throughput processors. Because of onboard storage restrictions and telemetry downlink limitations, these systems must pre-process and reduce the data before sending it to the ground. This onboard processing and real-time load may mean that newer, more powerful processors are needed even though acceptable radiation-hardened versions have not yet been released. CALIPSO's single board computer payload controller is actually a set of four (4) voting, non-radiation-hardened COTS PowerPC 603r processors. They are built on a single-width VME card by General Dynamics Advanced Information Systems (GDAIS).

Significant radiation concerns for CALIPSO and other Low Earth Orbit (LEO) satellites include the South Atlantic Anomaly (SAA), the north and south polar regions, and strong solar events. Over much of South America and extending into the South Atlantic Ocean (see figure 1), the Van Allen radiation belts dip to just 200-800 km. Spacecraft entering this area are subjected to high-energy protons and experience higher than normal Single Event Upset (SEU) and Single Event Latch-up (SEL) rates. Spacecraft flying near the poles experience similar, though less significant, upsets. Finally, powerful solar proton events can result in SEUs, SELs and permanent failures such as Single Event Gate Rupture (SEGR) in some technologies. Such events, in the range of 10 MeV/10 pfu to 100 MeV/1 pfu, are forecast and tracked by NOAA's Space Environment Center in Colorado. Galactic Cosmic Rays (GCRs) are another source of these effects, especially gate rupture.

CALIPSO mitigates common radiation concerns in its data handling through several means. These are redundant processors, radiation-hardened Application Specific Integrated Circuits (ASICs), hardware-based Error Detection and Correction (EDAC), processor and memory scrubbing, redundant boot code, and mirrored files. After presenting a system overview, this paper will expand on each of these strategies. Where applicable, related on-orbit data collected since CALIPSO's initial boot on May 4, 2006, will be noted.




(Figure 1) South Atlantic Anomaly

System Overview

The CALIPSO Single Board Computer (SBC) is a single-slot VME General Dynamics Integrated Spacecraft Computer (GDISC) system. This board was chosen for several reasons, including processor performance and radiation tolerance. A functional diagram is shown in figure 2, and significant system characteristics are listed in table 1 below. CALIPSO runs all four processors, but at a reduced clock speed of 160 MHz in order to meet spacecraft platform power requirements.


Item                  Characteristic
Processor             Main processor: PowerPC 603r (4), 240 MHz
                      (CALIPSO runs at a reduced speed of 160 MHz)
Performance           480 peak MIPS at 240 MHz
Non-volatile memory   16 MB Flash (EPROM), on-orbit programmable
                      128KB (2) EEPROM, storage for bootstrap code with
                      SECDED; CALIPSO application supports
                      verifying/updating on-orbit
RAM system memory     64MB SDRAM with Triple Error Correction/Quadruple
                      Error Detection (TECQED)
Interface memory      4MB SDRAM with TECQED
I/O                   MIL-STD-1553B
                      Two RS-422 serial lines
                      Three high-speed (10 Mbits/sec) serial lines

(Table 1) CALIPSO-Specific SBC Characteristics (Courtesy GDAIS)




(Figure 2) SBC Functional Block Diagram (courtesy of GDAIS)

CALIPSO application code, developed by Ball Aerospace in Boulder, includes more than 12 interrupt service routines and over 40 tasks, many of which run briefly and/or rarely. Several of the more processor-demanding CALIPSO flight software tasks are listed in Table 2 below. The utilization values were acquired on-orbit in nominal data acquisition mode.


Task Name                          CPU %
WFC RECEIVE TASK                   1.398
HR RECEIVE TASK                    0.709
PMC TARGET RANGE TASK              2.080
AD SPPS MON TASK                   1.067
LDR LDP PROCESS 532P FRAME TASK   22.097
LDR LDP PROCESS 532S FRAME TASK   21.715
LDR LDP PROCESS 1064 FRAME TASK   18.363
WFC PROCESS TASK                   0.151
MC MEM SCRUB TASK                  0.004
Idle Task                         29.725
Total CPU Utilization             70.28%

(Table 2) Significant CALIPSO Software Tasks

Radiation Mitigation Strategies 


Redundant processors and radiation-hardened Application Specific Integrated Circuits 

As mentioned earlier, CALIPSO uses a General Dynamics VME Single Board Computer with four non-radiation-hardened COTS PowerPC 603r processors running in strict lockstep. Address, data, and control outputs to memory and I/O are majority voted. If one of the members has been upset and is not working correctly, the remaining voting set of three still provides correct outputs. Processors that mis-compare are disabled until they can be reset. This design allows non-radiation-hardened processors to be used while still mitigating cosmic ray or proton-induced upsets.

As shown in figure 3 below, if a voting set of 4 loses a processor, it continues as a voting set of 3. If a voting set of 3 loses a processor, it continues as a comparing set of 2. If a comparing set of 2 encounters a mis-compare, the computer is reset and the software restarts. To ensure high-reliability operation, the voter and memory control logic are implemented in redundant radiation-hardened ASIC technology. The four-PowerPC voting design can maintain a reliable three-processor voting set even if one of the PowerPCs fails permanently, which increases the SBC's long-term reliability [1].
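The degradation policy just described can be made concrete with a small Python model. This is a minimal sketch, not the GDAIS voter logic. Each processor's bus output for one cycle is reduced to an integer, and the class and method names are hypothetical. Treating a vote with no strict majority (for example a 2-2 split) as a reset is an assumption; the text does not address that case.

```python
from collections import Counter
from typing import Dict, Optional


class LockstepVoter:
    """Toy model of the 4 -> 3 -> 2 -> reset degradation policy.

    Hypothetical sketch: each processor's output for one bus cycle is an int
    standing in for its address/data/control word.
    """

    def __init__(self, n_processors: int = 4) -> None:
        self.n_processors = n_processors
        self.enabled = set(range(n_processors))
        self.resets = 0

    def _reset(self) -> None:
        # Computer reset: software restarts and every processor rejoins.
        self.resets += 1
        self.enabled = set(range(self.n_processors))

    def vote(self, outputs: Dict[int, int]) -> Optional[int]:
        """Vote one cycle; return the bus output, or None if the SBC resets."""
        active = {p: outputs[p] for p in self.enabled}
        value, count = Counter(active.values()).most_common(1)[0]
        if len(active) == 2:
            # Comparing pair: any disagreement forces a reset.
            if count == 2:
                return value
            self._reset()
            return None
        if 2 * count > len(active):
            # Strict majority (3 of 4 or 2 of 3): disable any dissenter.
            self.enabled = {p for p, v in active.items() if v == value}
            return value
        # No strict majority (assumed case, e.g. a 2-2 split): reset.
        self._reset()
        return None
```

The model disables a dissenter instead of resetting at once. This is what gives the flight software time to resynchronize that processor before a second upset arrives.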


(Figure 3) SBC Processor Fault Detection and Reaction (courtesy of GDAIS)



CALIPSO flight code uses a polling technique to detect any processors that mis-compare and reset them as needed. This significantly reduces, and can eliminate, system resets due to mis-compares. This technique, called "Processor Error Scrubbing," is discussed in more detail in the "Processor and Memory Scrubbing" section below.


Hardware-based Error Detection and Correction (EDAC)

The GD SBC implements hardware-based EDAC to ensure that each task processes correct data. EDAC runs continuously on all in-use SDRAM, and on the boot EEPROM during system initialization. Two additional byte-wide columns of RAM memory chips are implemented for each 64-bit word. If a memory device fails, this spare memory can replace the failed device, enhancing long-term reliability. This mechanism can even circumvent shorts on data lines [1]. Spare RAM (memory not currently in use) is not covered by EDAC.

During operations, errors are corrected as they are read but are not corrected in the memory array; the corrected data are not written back to the device. Therefore, bit errors in system and interface memory can build up over time. Permanent corrections occur only as part of the CALIPSO flight software Memory Error Scrubbing (MES) task, discussed in more detail in the "Processor and Memory Scrubbing" section below.

In normal memory mode, the SBC supports Single-bit Error Correction/Double-bit Error Detection (SECDED). CALIPSO instead uses the optional mirrored memory mode, in which the two 64MB banks of RAM store the same image. This mode supports Triple-bit Error Correction/Quadruple-bit Error Detection (TECQED). The EDAC processing will therefore correct all single-, double-, and triple-bit errors and detect all quadruple-bit errors within a single byte.
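The consequence of correcting on read without writing back can be seen in a toy Python model. It is a sketch under simplifying assumptions, and all names are hypothetical. A word is treated as correctable while its count of flipped bits is at or below a single threshold. The default of 1 mimics SECDED; the model does not capture the per-byte TECQED behavior of mirrored mode.

```python
from typing import List, Set


class UncorrectableError(Exception):
    """Raised when a word holds more bit errors than EDAC can correct."""


class EdacMemory:
    """Toy model of correct-on-read EDAC without write-back.

    Hypothetical sketch: each word keeps its true value plus the set of bit
    positions currently flipped by upsets. A word is readable while the
    number of flipped bits is within `correctable_bits` (1 models SECDED).
    """

    def __init__(self, n_words: int, correctable_bits: int = 1) -> None:
        self.correctable_bits = correctable_bits
        self.values: List[int] = [0] * n_words
        self.upsets: List[Set[int]] = [set() for _ in range(n_words)]

    def write(self, addr: int, value: int) -> None:
        self.values[addr] = value
        self.upsets[addr].clear()        # a write stores clean bits

    def upset(self, addr: int, bit: int) -> None:
        self.upsets[addr] ^= {bit}       # a second hit on the same bit flips it back

    def read(self, addr: int) -> int:
        if len(self.upsets[addr]) > self.correctable_bits:
            raise UncorrectableError(f"word {addr:#x}")
        # The corrected value goes to the reader, but the array keeps its errors.
        return self.values[addr]

    def scrub(self, addr: int) -> None:
        # Read through EDAC, then explicitly write the corrected value back.
        self.write(addr, self.read(addr))
```

In this model, a second upset that lands on the same word before it is rewritten turns a correctable error into an uncorrectable one. The explicit `scrub` write-back is what prevents that.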


Processor and Memory Scrubbing 


The sole purpose of the Memory Error Scrubbing (MES) task is to keep single-bit errors from accumulating to the point where uncorrectable multi-bit errors occur. The MES task uses DMA, so the processor does not have to dedicate resources to it. All used system and interface memory is scrubbed every 10 minutes; the period is selectable via a software table value. Spare system memory is NOT scrubbed. The MES task for system memory scrubbing is illustrated in figure 4; interface memory scrubbing is similar.

The 128MB System Memory is partitioned into two 64MB banks to support mirrored mode. For scrubbing, each 64MB bank is divided into 128 blocks of 512KB. The MES task scrubs both 64MB banks together. Every 10 minutes this DMA process starts at the top of memory and pulls every 32-bit word across the bus, stopping at the bottom of each 512KB block. When the bottom of a block is reached, the process checks the error status registers and, if an error is flagged, generates an interrupt. The Interrupt Service Routine (ISR) for this interrupt corrects the data at the memory address noted in the error register. An entry is also placed in the Mission Support Software (MSS) error log indicating that the MES task corrected an error [2]. CALIPSO flight code reads this log at a 1 Hz rate and stores the data for transmission to the ground for analysis.

If an error was detected and corrected, the MES task re-scrubs the same block, starting at the address immediately following the address that was just corrected. The current version of CALIPSO flight code repeats this "block-retry" process until 12 errors have been detected and corrected in a block, or until no errors are generated. The MES task then moves on to the next block in the 128-block series. In this way, all active memory is scrubbed every 10 minutes.
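The block-retry loop can be sketched in Python as follows. This is an illustrative model rather than CALIPSO flight code, and the function names are hypothetical. It makes these simplifications:

- Memory errors are represented as a set of word addresses.
- The DMA pass and the ISR are collapsed into one function.
- Only a single bank is modeled; the real task scrubs both mirrored banks together.
- The error register is assumed to latch the first erroneous address in a pass. This is inferred from the restart-after-the-corrected-address behavior described above.

```python
from typing import List, Optional, Set

BLOCK_WORDS = 512 * 1024 // 4      # one 512KB block of 32-bit words
BLOCKS_PER_BANK = 128              # 64MB bank / 512KB blocks
MAX_ERRORS_PER_BLOCK = 12          # block-retry limit in the current flight code


def scrub_block(errors: Set[int], base: int, n_words: int,
                error_log: List[int]) -> int:
    """Scrub one block with block-retry; return the number of corrections.

    `errors` holds word addresses that currently contain correctable bit
    errors (hypothetical model). Assumption: the error status register
    latches the first erroneous address seen during a pass.
    """
    corrected = 0
    start, end = base, base + n_words
    while corrected < MAX_ERRORS_PER_BLOCK:
        # One DMA pass from `start` to the bottom of the block.
        latched: Optional[int] = min(
            (a for a in errors if start <= a < end), default=None)
        if latched is None:
            break                      # clean pass: go to the next block
        # ISR: write the corrected word back and record it in the error log.
        errors.discard(latched)
        error_log.append(latched)
        corrected += 1
        start = latched + 1            # re-scrub after the corrected word
    return corrected


def scrub_memory(errors: Set[int], error_log: List[int],
                 n_blocks: int = BLOCKS_PER_BANK,
                 block_words: int = BLOCK_WORDS) -> int:
    """One full scrub period over all blocks; return total corrections."""
    return sum(scrub_block(errors, b * block_words, block_words, error_log)
               for b in range(n_blocks))
```

Capping corrections at 12 per block bounds the time spent on any one block. One plausible reading is that this keeps a burst of errors in one region from delaying the scrub of the rest of memory.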


(Figure 4) CALIPSO System Memory Scrubbing Diagram
[Diagram: scrubber and hardware-based EDAC (Error Detection And Correction) reading system SDRAM in 512KB blocks, with error registers. At the last word in a block, check for errors; if an error was generated, rewrite memory to fix it. If an error was found and the error count is less than the maximum (12), scrub the block again.]



Total SEUs in Time Interval   1 SDRAM   2 SDRAMs   20 SDRAMs   22 SDRAMs
Worst (peak) minute           0.057     0.11       1.14        1.254
Worst hour                    0.95      1.89       18.94       20.834
Worst 4.6 hours               4.36      8.71       87.1        95.81
Worst 24 hours                22.7      45.5       454.6       500.06
Quiet (peak) minute           0.005     0.010      0.101       0.1111
Quiet hour (no SAA)           0.013     0.027      0.267       0.2937
Quiet 4.6 hours (no SAA)      0.06      0.123      1.227       1.35
Quiet 24 hours (no SAA)       0.26      0.520      5.200       5.72

(Table 3) CALIPSO Pre-Launch SEU Rate Predictions



(Figure 5) Memory Error Scrub Geo-location Map
[Map title: CALIPSO Memory Scrub Events, 7/29/06 (Uptime 87 Days). Legend: Interface Memory: 32; System Memory: 461; Processor Miscompares: 9. Scrub time denoted by asterisk.]

Figure 5 shows the locations of all memory errors and processor mis-compares that the CALIPSO scrubbing software has fixed to date. Each asterisk in the plot marks the location of CALIPSO when an error was fixed. The flight software scrubs and fixes all memory errors every ten minutes, so CALIPSO's location when an error actually occurred is not known. The line trailing each asterisk therefore represents 10 minutes of travel.


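CALIPSO pre-launch predicted orbital average SEU rate (over a 99-minute orbit):

$$(5\ \text{min} \times 1\ \text{peak} \times 0.1111\ \text{SEUs/min}) + (94\ \text{min} \times 0.0049\ \text{SEUs/min}) \approx 1.015\ \text{SEUs/orbit} \approx 0.0102\ \text{SEUs/min}$$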
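The orbit-average arithmetic can be reproduced in a few lines of Python. This is a minimal sketch: the function name is hypothetical, and the inputs are the values in the formula above. The 0.0049 SEUs/min figure is consistent with the 22-SDRAM quiet-hour prediction in Table 3 (0.2937/60 ≈ 0.0049). Reading "5 minutes * 1 peak" as five minutes per SAA peak, one peak per orbit, is an assumption.

```python
from typing import Tuple


def orbit_average_seu_rate(peak_minutes: float = 5.0,
                           peak_rate: float = 0.1111,   # SEUs/min, 22 SDRAMs
                           quiet_minutes: float = 94.0,
                           quiet_rate: float = 0.0049,  # SEUs/min, no SAA
                           peaks_per_orbit: int = 1) -> Tuple[float, float]:
    """Return (SEUs per orbit, SEUs per minute) for one orbit (hypothetical helper)."""
    per_orbit = (peak_minutes * peaks_per_orbit * peak_rate
                 + quiet_minutes * quiet_rate)
    orbit_minutes = peak_minutes * peaks_per_orbit + quiet_minutes
    return per_orbit, per_orbit / orbit_minutes


if __name__ == "__main__":
    per_orbit, per_minute = orbit_average_seu_rate()
    print(f"{per_orbit:.4f} SEUs/orbit, {per_minute:.5f} SEUs/min")
```

With the default arguments this gives about 1.016 SEUs per orbit. That matches the 1.015 quoted above to within rounding.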


The CALIPSO SBC currently uses 22 SDRAMs: 20 for system memory and 2 for interface memory. Table 3 above shows the pre-launch SEU predictions based on proton testing at GD. Table 4 below shows the MES SEU data acquired from May 4, 2006, when CALIPSO was first powered on, through July 29, 2006.



Memory             Size   SEU Count   % of Total Scrub Events
System Memory      64MB   461          93
Interface Memory   4MB    32            7
Totals                    493         100

(Table 4) CALIPSO SEU Data from the MES Error Log


As of July 29, 2006, CALIPSO had been running for 87 days. At 16 orbits per day, the predictions indicate a total SEU count of 87 × 16 × 1.015, or about 1412. CALIPSO is currently experiencing an SEU rate of approximately 0.345/orbit, or 0.0034 per minute, well below predictions. Based on the pre-launch SEU predictions, CALIPSO flight code was configured to support a 60K SEU scrubbing margin:


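$$\frac{12\ \text{SEUs/block} \times 512\ \text{blocks}}{10\ \text{min scrub period} \times 0.0102\ \text{SEUs/min}} \approx 60\text{K}$$

At the current SEU rate, CALIPSO's scrubbing margin is:

$$\frac{12\ \text{SEUs/block} \times 512\ \text{blocks}}{10\ \text{min scrub period} \times 0.0035\ \text{SEUs/min}} \approx 175\text{K}$$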

CALIPSO is expected to retain this large margin throughout its planned 3-year life.
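The expected-count and margin arithmetic above can be checked with a short Python sketch. The function names are hypothetical, and the arguments in the usage lines are the ones used in the text and formulas above. The margin is the correction capacity per minute divided by the SEU arrival rate. Capacity per minute is the maximum corrections per block, times the block count, divided by the scrub period. The block count is a parameter, so the sensitivity of the margin to that value is easy to explore.

```python
def expected_seu_count(days: float, orbits_per_day: float,
                       seus_per_orbit: float) -> float:
    """Predicted SEUs over a mission interval (hypothetical helper)."""
    return days * orbits_per_day * seus_per_orbit


def scrub_margin(max_errors_per_block: int, blocks: int,
                 scrub_period_min: float, seu_rate_per_min: float) -> float:
    """Correction capacity per minute divided by the SEU arrival rate."""
    capacity_per_min = max_errors_per_block * blocks / scrub_period_min
    return capacity_per_min / seu_rate_per_min


if __name__ == "__main__":
    print(f"{expected_seu_count(87, 16, 1.015):.1f}")   # predicted SEUs through 7/29/06
    print(f"{scrub_margin(12, 512, 10, 0.0102):.0f}")   # pre-launch margin
    print(f"{scrub_margin(12, 512, 10, 0.0035):.0f}")   # margin at the observed rate
```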

The purpose of Processor Error Scrubbing (PES) is to prevent multiple CPUs from sitting in a disabled state. As noted earlier, processors that suffer radiation-induced upsets and do not vote with the majority are disabled until explicitly resynchronized. With multiple processors disabled, the SBC is left as a comparing pair, and the next mis-compare forces a system reset. CALIPSO flight software polls the SBC "re-sync pending" register bit at a rate of 1 Hz and, if the bit is set, initiates a processor resynchronization. According to GD engineers, resynchronization takes approximately 1 ms. The pre-launch predictions indicated that CALIPSO may see 3 processor mis-compares every week (168 hours). Table 5 shows the on-orbit mis-compare data acquired over the 87 days between May 4 and July 29, 2006.


Processor Number   Mis-Compare Count   % of Total
Processor (0)      2                    22
Processor (1)      3                    33
Processor (2)      3                    33
Processor (3)      1                    11
Total              9                   100

(Table 5) CALIPSO Processor Mis-Compare Data




Based on the data to date, CALIPSO is experiencing a processor mis-compare rate of approximately 0.726 per week, well below the predicted 3 per week. With this relatively low mis-compare rate, the 1 Hz re-sync polling rate is more than adequate to prevent most, if not all, system resets due to mis-compared processors.
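The 1 Hz Processor Error Scrubbing loop can be sketched in Python as below. The text does not describe the real SBC interface, so the register read and the resynchronization call are passed in as callables. Both are hypothetical stand-ins, not GD API names. The `cycles` argument exists only so the loop can be exercised in a test; flight code would poll indefinitely.

```python
import time
from typing import Callable


def processor_error_scrub(read_resync_pending: Callable[[], bool],
                          resync_processors: Callable[[], None],
                          cycles: int,
                          period_s: float = 1.0,
                          sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll the re-sync pending bit once per period; return resyncs issued.

    Both callables are hypothetical stand-ins for the SBC register read
    and the resynchronization call.
    """
    resyncs = 0
    for _ in range(cycles):
        if read_resync_pending():
            # A disabled processor is waiting: bring it back into the voting
            # set (roughly 1 ms per GD) before a second upset can occur.
            resync_processors()
            resyncs += 1
        sleep(period_s)
    return resyncs


if __name__ == "__main__":
    # Simulated register: the bit is set on the 2nd and 4th polls.
    pending = iter([False, True, False, True, False])
    calls = []
    n = processor_error_scrub(lambda: next(pending),
                              lambda: calls.append("resync"),
                              cycles=5, sleep=lambda s: None)
    print(n, calls)
```

Injecting `sleep` lets a test run the loop instantly, while the default keeps the real 1-second period.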


Redundant Boot Code 


The non-volatile 128KB EEPROMs contain the bootstrap code, which is used at power-up to initialize, configure, and verify the SBC hardware. These two 128KB EEPROM devices store identical boot code images. Power is applied to these devices only when needed, so the chips are powered off most of the time. A soft reset automatically results in a switch to the redundant EEPROM.

During start-up, EDAC is performed on the active boot device. All single-bit errors are corrected, while multi-bit errors may result in a watchdog timeout and subsequent soft reset. This Single Error Correction/Double Error Detection (SECDED) feature requires an extra byte of check data for each 4-byte word. Thus the 68KB boot image is 80KB when this SECDED information is added.

The CALIPSO software team decided to implement on-orbit verification and, if necessary, updating of the boot code. While on-orbit, the boot images are routinely dumped to the ground and verified; if errors are observed, the original image can be rewritten. If the device itself begins to fail, a new image can be built that bypasses failed memory addresses. As an operations note, loading a new boot image of approximately 80KB requires 15 minutes of spacecraft contact time, or two nominal contacts. The CALIPSO software teams at NASA Langley and Ball Aerospace have verified that they can rebuild a valid boot image from source code. As a developer's note, the GD Refresh Boot Memory (RBM) API was used to support rewriting the EEPROM. As of July 29, 2006, the operations group at Langley had dumped and examined the onboard EEPROM devices three (3) times, and no errors have been identified.
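SECDED fits in one extra byte per 4-byte word, which the sketch below illustrates. It implements a textbook extended Hamming code over a 32-bit word: six Hamming check bits plus one overall parity bit, seven bits in total. This is an illustration only. The text does not say which code the GD boot EEPROM actually uses, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

CODE_LEN = 39                      # bit 0: overall parity; bits 1..38: Hamming code
PARITY_POS = [1, 2, 4, 8, 16, 32]  # Hamming check bits at powers of two
DATA_POS = [p for p in range(1, CODE_LEN) if p & (p - 1)]  # the 32 data positions


def encode(word: int) -> List[int]:
    """Encode a 32-bit word into 39 bits (32 data + 7 check)."""
    bits = [0] * CODE_LEN
    for i, pos in enumerate(DATA_POS):
        bits[pos] = (word >> i) & 1
    for p in PARITY_POS:
        # Make the parity over every position whose index has bit p set even.
        bits[p] = sum(bits[q] for q in range(1, CODE_LEN) if q & p and q != p) % 2
    bits[0] = sum(bits[1:]) % 2    # overall parity: distinguishes 1 flip from 2
    return bits


def decode(bits: List[int]) -> Tuple[Optional[int], str]:
    """Return (word, status); status is ok, corrected, double or uncorrectable."""
    bits = list(bits)
    syndrome = 0
    for pos in range(1, CODE_LEN):
        if bits[pos]:
            syndrome ^= pos        # XOR of set positions is 0 for a valid codeword
    overall_odd = sum(bits) % 2 == 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        # Odd number of flips: assume a single error at position `syndrome`
        # (syndrome 0 means the overall-parity bit itself was hit).
        if syndrome >= CODE_LEN:
            return None, "uncorrectable"
        bits[syndrome] ^= 1
        status = "corrected"
    else:
        return None, "double"      # even number of flips, nonzero syndrome
    word = sum(bits[pos] << i for i, pos in enumerate(DATA_POS))
    return word, status
```

A code like this can miscorrect or miss three or more flipped bits in one word. That limitation is one reason periodic ground verification of the dumped images is still useful.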


Mirrored files 


Application files are stored on redundant 8MB EPROM devices. Like the boot devices, these are powered only when being accessed, for radiation reasons. Operations staff check these files regularly via payload command and rewrite them as necessary. Certain executable image files are mirrored, i.e., stored on both devices for added safety. CALIPSO maintains two operational images onboard, one on each device, and one "maintenance" image. The maintenance image is expected to be used only when neither operational image will boot; it is the only one of these images that is mirrored on both EPROM devices. To date, no errors have been detected.
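The boot-image fallback described above can be sketched in Python as an ordered search. Everything in this sketch is hypothetical: the image names, the device layout, and the integrity check. The check compares a CRC-32 (`zlib.crc32`) against a stored reference value. The text says only that operations staff check the files via payload command; it does not describe how the images are validated.

```python
import zlib
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: device number -> {image name -> image bytes}.
Devices = Dict[int, Dict[str, bytes]]

# Hypothetical boot order: each operational image lives on one device; the
# maintenance image is mirrored on both and is tried only as a last resort.
BOOT_ORDER: List[Tuple[int, str]] = [
    (0, "operational_a"),
    (1, "operational_b"),
    (0, "maintenance"),
    (1, "maintenance"),
]


def select_boot_image(devices: Devices,
                      reference_crc: Dict[str, int],
                      order: List[Tuple[int, str]] = BOOT_ORDER,
                      ) -> Optional[Tuple[int, str]]:
    """Return the first (device, image) whose CRC-32 matches its reference."""
    for device, name in order:
        image = devices.get(device, {}).get(name)
        if image is not None and zlib.crc32(image) == reference_crc[name]:
            return device, name
    return None  # nothing bootable
```

Because the maintenance image is mirrored, losing one EPROM copy of it still leaves a bootable fallback.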


Conclusions 

Space-based remote sensing systems that include multiple instruments and/or instruments such as lidar generate large volumes of data. They require robust real-time hardware and software mechanisms and high-throughput processors. Because of onboard storage restrictions and telemetry downlink limitations, these systems must pre-process and reduce the data before sending it to the ground. This onboard processing and real-time load may mean that newer, more powerful processors are needed even though acceptable radiation-hardened versions have not yet been released.

Using non-radiation-hardened systems requires that robust mitigation strategies be developed and employed. CALIPSO uses several mitigation techniques, including Error Detection and Correction (EDAC); memory and processor scrubbing; and device, file, and processor redundancy. CALIPSO demonstrates that, with the right mix of software and hardware, COTS systems can be used effectively and efficiently in LEO.